🔤 Character Classification
Unicode Processing, Character Sets, Text Parsing, SMT Applications
LombardoGraphia: Automatic Classification of Lombard Orthography Variants
📝 Punctuation Parsing · arxiv.org · 2d
An Introduction to Writing Systems and Unicode
🌏 Character Sets · r12a.github.io · 4d · Hacker News
On the classification of questioned PDF documents — Attributing PDF documents to the tools that created them
📄 PDF Archaeology · sciencedirect.com · 7h
Web Neural Network API
📊 Quantization · w3.org · 12h · Hacker News
fadere/redaction-machine: a jupyter notebook for redacting words from images
📜 Manuscript Digitization · github.com · 1d · Hacker News
How I accidentally made the fastest C# CSV parser
🇨🇳 Chinese Computing · bepis.io · 6d · Hacker News, r/programming
Batch Distillation Data for Developing Machine Learning Anomaly Detection Methods
🧠 Machine Learning · nature.com · 2d
MACHINE LEARNING
🧠 Machine Learning · zackmendel.medium.com · 2d
Show HN: PyNear – exact and approximate KNN, faster than Faiss
🚀 SIMD Text Processing · news.ycombinator.com · 4d · Hacker News
RYS Part 3: LLMs think in geometry, not language — new results across 4 models, including code and math
🛠 Language Design · dnhkng.github.io · 6d · Hacker News, r/LocalLLaMA
Efficient Domain Adaptation for Text Line Recognition via Decoupled Language Models
📝 Punctuation Parsing · arxiv.org · 2d
pablocael/pynear: A python library for efficient KNN search within metric spaces using multiple distance functions.
🗂️ Vector Search · github.com · 5d · Hacker News
Stringological sequence prediction I: efficient algorithms for predicting highly repetitive sequences
🔍 RegEx Engines · arxiv.org · 2d
IvanRolero/LightOCRX: Lightweight OCR software inference using Onnxruntime and Paddle Models
👁️ Constructive OCR · github.com · 3d · r/cpp
JaWildText: A Benchmark for Vision-Language Models on Japanese Scene Text Understanding
🤖 Advanced OCR · arxiv.org · 2d
Text Data Integration
📋 Document Grammars · arxiv.org · 2d
Show HN: UBPE – a universal BPE tokenizer, optimized and rethought
📐 Binary Grammars · github.com · 5d · Hacker News
Quid est VERITAS? A Modular Framework for Archival Document Analysis
🏰 Medieval Parsing · arxiv.org · 2d
MDPBench: A Benchmark for Multilingual Document Parsing in Real-World Scenarios
⚙️ Compression Benchmarking · arxiv.org · 2d
A Boltzmann-machine-enhanced Transformer For DNA Sequence Classification
🧠 Machine Learning · arxiv.org · 3d